A SANOV-TYPE THEOREM FOR EMPIRICAL MEASURES ASSOCIATED
WITH THE SURFACE AND CONE MEASURES ON $\ell^p$ SPHERES


By Steven Soojin Kim and Kavita Ramanan 

Brown University 

We prove a large deviations principle (LDP) for the empirical measure of
the coordinates of a random vector distributed according to the surface
measure on a suitably scaled $\ell^p$ sphere in $\mathbb{R}^n$, as $n \to \infty$. This LDP is established
for $p \in [1,\infty]$ with respect to the $q$-Wasserstein topology, for every $q < p$.

We prove the result by first establishing an analogous LDP when the random
vector is distributed according to the cone measure on the scaled $\ell^p$ sphere. In
addition, we combine our LDP with the Gibbs conditioning principle to obtain
an asymptotic probabilistic description of the geometry of an $\ell^p$ sphere
under certain $\ell^q$ norm constraints, for $q < p$. These results are also of
relevance for the study of the large deviations behavior of random projections of
$\ell^p$ balls.


1. Introduction. Given a sequence of independent and identically distributed (i.i.d.)
random variables $(X_n)_{n \in \mathbb{N}}$ sampled from a probability measure $\mu$ on $\mathbb{R}$, a classical result in
probability theory is Sanov’s theorem, which states that the associated sequence of empirical
measures $(L_n)_{n \in \mathbb{N}}$, where

\[ L_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}, \qquad n \in \mathbb{N}, \]

(with $\delta_x$ representing the Dirac mass at $x \in \mathbb{R}$) satisfies a large deviations principle (LDP)
with rate function given by the relative entropy with respect to $\mu$ (the precise definitions of
an LDP, relative entropy, and other relevant notions are recalled in Section 2.1 and Section
2.2).

The goals of this paper are to establish an analog of Sanov’s theorem for empirical
measures arising in a geometric context, and to establish a related conditional limit theorem.
Specifically, in Theorem 2.5 (resp., Theorem 2.8), for $p \in [1,\infty]$ we prove an LDP for
the sequence $(L_{n,p})_{n \in \mathbb{N}}$, where $L_{n,p}$ is the empirical measure of the coordinates of a random
vector that is distributed according to the surface measure (resp., the cone measure)
on a suitably scaled $\ell^p$ sphere in $\mathbb{R}^n$, and show that the associated rate function takes the
form of a perturbed relative entropy functional. We point out that this LDP holds in the
$q$-Wasserstein topology for every $q < p$. As summarized in Remark 2.4, we crucially utilize
the fact that the LDP holds in the $q$-Wasserstein topology, which is stronger than the usual
weak topology, in order to prove some interesting geometric consequences of the LDP for
$(L_{n,p})_{n \in \mathbb{N}}$. For example, as elaborated below, we apply the LDP to establish the conditional
limit result of Theorem 2.12. Consequently, Corollary 2.13 provides a precise version of the
following (roughly stated) asymptotic probabilistic description of an $\ell^p$ sphere under an
$\ell^q$ norm constraint: for $q < p$, in high dimensions, a random point on the $\ell^p$ sphere conditioned
on having small $\ell^q$ norm is close (in the sense of distribution) to a random point drawn from
an appropriately scaled $\ell^q$ sphere.

In contrast to the i.i.d. setting of the classical Sanov’s theorem, the coordinates $X_i^{(n,p)}$,
$i = 1, \ldots, n$, of a random vector $X^{(n,p)}$ distributed according to the surface measure (or cone measure)
of the $\ell^p$ unit sphere $\mathbb{S}_{n,p}$ in $\mathbb{R}^n$ are dependent because $X^{(n,p)}$ is constrained to lie in $\mathbb{S}_{n,p}$. Note
that LDPs have previously been established for empirical measures of random variables
with other dependency structures. For example, see [9, §6.3-6.6] for methods which apply to
Markov chains or stationary sequences satisfying certain mixing conditions. Alternatively,
in random matrix theory, a large deviations principle for the empirical spectral measure (for
which there is strong interaction among the eigenvalues) can be found in [3]. However, these
settings do not include the dependency structure of $X^{(n,p)}$ induced by the surface measure
on the $\ell^p$ sphere.

A further motivation for our study arises from the fact that the empirical measure LDP
of this paper will be applied in [12], in the proof of a variational formula relating certain
“quenched” and “annealed” large deviations rate functions for random projections of $\ell^p$
balls.

In addition, we believe that our main question is of intrinsic interest, motivated by geometric
considerations. Indeed, we are interested in such LDPs due to the classical mantra
that LDPs (as opposed to just large deviation bounds) yield not only the asymptotic exponential
rate of decay of the probability of a rare event, but also the most likely way in which
such a rare event can occur. This general idea has been realized to great effect in statistical
mechanics, where large deviations theory can describe the most probable state of a system
of particles under an energy constraint (see, e.g., the surveys [4, 11]). Of particular utility
is the so-called “Gibbs conditioning principle”, which transforms an LDP for empirical
measures into a statement about the most probable behavior of the underlying sequence
of random variables, conditional on a rare event. A central motivation for our work is to
employ such a conditional probabilistic perspective in a geometric setting, by investigating
how LDPs can inform the analysis of “geometric” rare events in high dimensions. One obstacle
to this goal is that it is not a priori clear which rare events should be conditioned upon
in order to obtain a meaningful result. In this work, we demonstrate that one natural example
of a geometric rare event is a deviation of the $\ell^q$ norm, for $q < p$. At a high level, our
second set of results, Theorem 2.12 and Corollary 2.13, analyzes a geometric rare event by
giving an asymptotic description of the surface measure on a high-dimensional $\ell^p$ sphere,
conditional on a sufficiently small $\ell^q$ norm.

To be more concrete, first recall the classical i.i.d. setting (see, e.g., [7, 8, 22]): Sanov’s
theorem and the associated Gibbs conditioning principle state that under mild assumptions,
conditional on a large deviation of $L_n$, the empirical measure of $n$ i.i.d. random
variables, the joint law of $k = o(n)$ of the variables is asymptotically (as $n \to \infty$) close
to the $k$-fold product of a certain “exponentially tilted” distribution, under which the large
deviation conditioning event is in fact typical. For a concrete example, suppose $X_1, X_2, \ldots$
are i.i.d. exponential random variables with mean 1; then, conditional on the rare event
$\{\int_{\mathbb{R}} x \, L_n(dx) = \frac{1}{n} \sum_{i=1}^{n} X_i > \beta\}$ for some $\beta > 1$, the joint law of $(X_1, \ldots, X_k)$ converges in
total variation as $n \to \infty$ to that of $k$ i.i.d. exponential random variables with mean $\beta$.
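As a hedged illustration of the exponential example (a sketch that is not part of the original argument), the tilted law can be made explicit. Tilting the mean-1 exponential density $e^{-x}$ by $e^{tx}$ with $t < 1$ gives an exponential law with mean $1/(1-t)$, so matching the mean $\beta$ requires $t = 1 - 1/\beta$, and the relative entropy of the tilted law with respect to the original one is $\beta - 1 - \log \beta$. The function names below are hypothetical, and the quadrature routine is only a numerical sanity check of the closed form.

```python
import math


def tilt_parameter(beta):
    """Tilt t such that Exp(mean 1) tilted by e^{t x} has mean beta (beta > 1)."""
    return 1.0 - 1.0 / beta


def relative_entropy_closed_form(beta):
    """H(Exp(mean beta) || Exp(mean 1)) = beta - 1 - log(beta)."""
    return beta - 1.0 - math.log(beta)


def relative_entropy_numeric(beta, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of the same relative entropy on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        tilted = math.exp(-x / beta) / beta          # density of Exp(mean beta)
        log_ratio = -math.log(beta) - x / beta + x   # log(tilted density / e^{-x})
        total += tilted * log_ratio * h
    return total
```

For moderate values of $\beta$ the two entropy routines should agree up to quadrature error; the truncation point `upper` is an assumption that would need to grow with $\beta$.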

In our geometric setting of $\ell^p$ spheres, we identify a suitable class of rare events to
condition upon, and apply the Gibbs conditioning principle to obtain Theorem 2.12, which
addresses the question of “how” an unlikely geometric event occurs (with fixed $k$ instead
of $k = o(n)$, and weak convergence instead of total variation convergence). The primary
novel consequence in this setting is that Corollary 2.13 lends a geometric meaning to our
exponentially tilted conditional limit law.

The remainder of the paper is organized as follows. In Section 2, we recall some basic 
definitions and state our main results. In Section 3, we prove Theorem 2.8, our LDP result 
in the case of the cone measure. In Section 4, we show how the preceding result implies 
Theorem 2.5, our LDP result in the case of the surface measure. Lastly, in Section 5, we 
prove the conditional limit results, Theorem 2.12 and Corollary 2.13.

2. Preliminaries and main results. 


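2.1. The sequence of empirical measures. Let $p \in [1,\infty]$ and $x \in \mathbb{R}^n$. Denote the $\ell^p$
norm of $x$ by $\|x\|_{n,p} = (\sum_{i=1}^{n} |x_i|^p)^{1/p}$ for $p < \infty$, and $\|x\|_{n,\infty} = \max_{i=1,\ldots,n} |x_i|$ for $p = \infty$.
Note that $\|\cdot\|_{n,2}$ denotes the usual Euclidean norm on $\mathbb{R}^n$. Let $\mathbb{S}_{n,p} = \{x \in \mathbb{R}^n : \|x\|_{n,p} = 1\}$
be the unit $\ell^p$ sphere in $\mathbb{R}^n$. Let $\mathcal{B}(\mathbb{R}^n)$ denote the Borel sigma-algebra of $\mathbb{R}^n$, and let $\mathrm{vol}_n(\cdot)$
denote Lebesgue measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. Let $\mathrm{area}_{n,p}(\cdot)$ denote the (unnormalized) surface
measure on $\mathbb{S}_{n,p}$; in other words, if we let $\mathcal{S}_{n,p} = \{S = S' \cap \mathbb{S}_{n,p} : S' \in \mathcal{B}(\mathbb{R}^n)\}$, then for
$S \in \mathcal{S}_{n,p}$,

\[ \mathrm{area}_{n,p}(S) = \lim_{\epsilon \downarrow 0} \frac{\mathrm{vol}_n(\{x + t : x \in S, \|t\|_{n,2} < \epsilon\})}{2\epsilon}. \]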


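We consider the following natural probability measure on $\mathbb{S}_{n,p}$.

Definition 2.1. Let $\sigma_{n,p}$ denote the (normalized) surface measure on $\mathbb{S}_{n,p}$, which is
defined as

\[ \sigma_{n,p}(S) = \frac{\mathrm{area}_{n,p}(S)}{\mathrm{area}_{n,p}(\mathbb{S}_{n,p})}, \qquad S \in \mathcal{S}_{n,p}. \]

Note that for $p = 2$, the surface measure $\sigma_{n,2}$ is the unique rotation invariant probability
measure on $\mathbb{S}_{n,2}$, the $(n-1)$-dimensional unit sphere in $\mathbb{R}^n$.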

Suppose all random variables we introduce are supported on a common probability space
$(\Omega, \mathcal{F}, \mathbb{P})$. For a random variable $\xi$ and a measure $\mu$, we write $\xi \sim \mu$ if the law of $\xi$ is $\mu$;
that is, if $\mathbb{P} \circ \xi^{-1} = \mu$. For $n \in \mathbb{N}$ and $p \in [1,\infty]$, let $X^{(n,p)} = (X_1^{(n,p)}, \ldots, X_n^{(n,p)})$ be an
$n$-dimensional random vector, and let $L_{n,p}$ be the empirical measure of the coordinates of
$n^{1/p} X^{(n,p)}$,

\[ L_{n,p} = \frac{1}{n} \sum_{i=1}^{n} \delta_{n^{1/p} X_i^{(n,p)}}. \tag{2.1} \]

We aim to prove a large deviations principle for the sequence $(L_{n,p})_{n \in \mathbb{N}}$ under the assumption
that $X^{(n,p)} \sim \sigma_{n,p}$. To (heuristically) understand why we scale the vector $X^{(n,p)}$ by a
factor of $n^{1/p}$ in the definition of $L_{n,p}$, note that $X^{(n,p)}$ lies on the unit $\ell^p$ sphere in $\mathbb{R}^n$; thus,
each coordinate $X_i^{(n,p)}$, $i = 1, \ldots, n$, is approximately of order $n^{-1/p}$, so each scaled coordinate
$n^{1/p} X_i^{(n,p)}$ is approximately of order 1, which turns out to be the appropriate order of
magnitude to analyze large deviations of empirical measures.

2.2. Background on large deviations. We recall below the definition of a large deviations
principle.


Definition 2.2. Let $\mathcal{X}$ be a topological space equipped with its Borel $\sigma$-algebra. The
sequence of probability measures $(\mu_n)_{n \in \mathbb{N}} \subset \mathcal{P}(\mathcal{X})$ is said to satisfy a large deviations
principle (LDP) with a rate function $I : \mathcal{X} \to [0,\infty]$ if $I(\cdot)$ is lower semicontinuous, and for all
Borel measurable sets $\Gamma \subset \mathcal{X}$,

\[ -\inf_{x \in \Gamma^\circ} I(x) \le \liminf_{n \to \infty} \frac{1}{n} \log \mu_n(\Gamma^\circ) \le \limsup_{n \to \infty} \frac{1}{n} \log \mu_n(\bar{\Gamma}) \le -\inf_{x \in \bar{\Gamma}} I(x), \]

where $\Gamma^\circ$ and $\bar{\Gamma}$ denote the interior and closure of $\Gamma$, respectively. Furthermore, $I$ is said
to be a good rate function if it has compact level sets. Similarly, we say the sequence of
$\mathcal{X}$-valued random variables $(\xi_n)_{n \in \mathbb{N}}$ satisfies an LDP if the sequence of laws $(\mu_n)_{n \in \mathbb{N}}$ given
by $\mu_n = \mathbb{P} \circ \xi_n^{-1}$ satisfies an LDP.


For a broad review of large deviations, we refer to [9]. In particular, suppose $X^{(n)} \sim \mu^{\otimes n}$
(the $n$-fold product measure) for some $\mu \in \mathcal{P}(\mathbb{R})$. Equivalently, suppose the coordinates
$X_i^{(n)}$, $i = 1, \ldots, n$, of $X^{(n)}$ are independent and identically distributed (i.i.d.) with common
distribution $\mu$. Then, Sanov’s theorem states that the sequence of associated empirical
measures $L_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i^{(n)}}$ satisfies an LDP with good rate function $H(\cdot \| \mu)$, where $H$ is the
relative entropy,

\[ H(\nu \| \mu) = \begin{cases} \int_{\mathbb{R}} \log\left(\frac{d\nu}{d\mu}\right) d\nu & \text{if } \nu \ll \mu, \\ +\infty & \text{else,} \end{cases} \tag{2.2} \]

where $\nu \ll \mu$ denotes that $\nu$ is absolutely continuous with respect to $\mu$.

Before we state the LDP and associated rate function in our setting, we first set some
notation and definitions. For $q \in [1,\infty)$, let $m_q : \mathcal{P}(\mathbb{R}) \to [0,\infty]$ be the map that takes a measure
to its $q$-th absolute moment; that is, for $\nu \in \mathcal{P}(\mathbb{R})$,

\[ m_q(\nu) = \int_{\mathbb{R}} |x|^q \, \nu(dx). \tag{2.3} \]

For $q = \infty$, let

\[ m_\infty(\nu) = \inf\{a > 0 : \nu([-a,a]^c) = 0\}. \]


Our LDP will hold with respect to the so-called Wasserstein topologies on $\mathcal{P}(\mathbb{R})$ introduced
below. We refer to [25, §6] for an extensive review of the Wasserstein topology.

Definition 2.3. Let $q \in [1,\infty]$, and let

\[ \mathcal{P}_q(\mathbb{R}) = \{\mu \in \mathcal{P}(\mathbb{R}) : m_q(\mu) < \infty\}. \]

A sequence of probability measures $(\mu_n)_{n \in \mathbb{N}} \subset \mathcal{P}_q(\mathbb{R})$ is said to converge to $\mu \in \mathcal{P}_q(\mathbb{R})$ with
respect to the Wasserstein-$q$ topology (or with respect to $W_q$) if $\mu_n \Rightarrow \mu$ (converges weakly)
and $m_q(\mu_n) \to m_q(\mu) < \infty$ as $n \to \infty$.

Remark 2.4. We consider the $q$-Wasserstein topology on probability measures (instead
of, e.g., the weak topology or the $\tau$ topology) because we must consider a topology
that is weak enough to allow an LDP to hold, but at the same time strong enough to allow
certain moments to be continuous functionals of measures. In particular, we require a
topology strong enough such that the moment map $m_q$ is continuous for $q < p$, which is
used in the proofs of both the Gibbs conditioning result of Theorem 2.12 and the variational
formula of [12] in the context of random projections of $\ell^p$ balls.

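Let $\mu_p$ denote the generalized Gaussian distribution with location 0, shape $p$, and scale
$p^{1/p}$. That is, for $p \in [1,\infty)$,

\[ \mu_p(dy) = \frac{1}{2 p^{1/p} \Gamma(1 + \frac{1}{p})} e^{-|y|^p/p} \, dy, \qquad y \in \mathbb{R}. \tag{2.4} \]

The case $p = 2$ corresponds to the standard Gaussian distribution. For $p = \infty$, define

\[ \mu_\infty(dy) = \frac{1}{2} 1_{[-1,1]}(y) \, dy, \qquad y \in \mathbb{R}. \]

As for the rate function itself, for $p \in [1,\infty]$, let $\mathbb{H}_p : \mathcal{P}(\mathbb{R}) \to [0,\infty]$ be a version of the
relative entropy with respect to $\mu_p$, perturbed by some $p$-th moment penalty:

\[ \mathbb{H}_p(\nu) = \begin{cases} H(\nu \| \mu_p) + \frac{1}{p}(1 - m_p(\nu)) & \text{if } m_p(\nu) \le 1, \\ +\infty & \text{else,} \end{cases} \tag{2.5} \]

where we take the convention $\frac{1}{\infty} = 0$.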
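To get a feel for the perturbed entropy, the following sketch (an illustration under stated assumptions, not part of the paper) evaluates it numerically on a one-parameter family. It assumes the rate function has the form $H(\nu \| \mu_p) + \frac{1}{p}(1 - m_p(\nu))$ on $\{m_p(\nu) \le 1\}$, the form that also appears in the proof of Proposition 3.4, and takes $\nu$ to be the law of $c^{-1/p} Y$ with $Y \sim \mu_p$ and $c \ge 1$, so that $m_p(\nu) = 1/c$. A direct computation then gives the value $\frac{1}{p} \log c$. The function names and the midpoint quadrature are choices made here for illustration.

```python
import math


def mu_p_density(y, p):
    """Density of mu_p: exp(-|y|^p / p) / (2 p^{1/p} Gamma(1 + 1/p)), 1 <= p < infinity."""
    norm = 2.0 * p ** (1.0 / p) * math.gamma(1.0 + 1.0 / p)
    return math.exp(-abs(y) ** p / p) / norm


def perturbed_entropy_of_scaled_mu_p(p, c, half_width=12.0, steps=120_000):
    """Midpoint-rule value of H(nu || mu_p) + (1 - m_p(nu)) / p.

    Here nu is the law of c^{-1/p} Y with Y ~ mu_p and c >= 1, so m_p(nu) = 1/c <= 1.
    """
    scale = c ** (1.0 / p)
    h = 2.0 * half_width / steps
    entropy = 0.0
    moment = 0.0
    for i in range(steps):
        y = -half_width + (i + 0.5) * h
        nu = scale * mu_p_density(scale * y, p)   # density of c^{-1/p} Y at y
        if nu > 0.0:                              # skip points where nu underflows to 0
            entropy += nu * math.log(nu / mu_p_density(y, p)) * h
        moment += nu * abs(y) ** p * h
    return entropy + (1.0 - moment) / p
```

With $c = 1$ the measure $\nu$ equals $\mu_p$ and the value is numerically zero, consistent with $\mu_p$ being the minimizer of the rate function.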

2.3. Main results. Our first main result is as follows: 

Theorem 2.5 (LDP under the surface measure). Let $p \in [1,\infty]$ and assume $X^{(n,p)} \sim \sigma_{n,p}$.
For $q < p$, the sequence of empirical measures $(L_{n,p})_{n \in \mathbb{N}}$ of (2.1) satisfies an LDP in
$\mathcal{P}_q(\mathbb{R})$ equipped with the $W_q$ topology, with the convex good rate function $\mathbb{H}_p$ of (2.5).

The proof of Theorem 2.5 is deferred to Section 4. Our approach is to first prove an LDP
under a related measure on $\mathbb{S}_{n,p}$ called the cone measure, defined as follows.


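Definition 2.6. Let $\gamma_{n,p}$ denote the cone measure on $\mathbb{S}_{n,p}$, defined as

\[ \gamma_{n,p}(S) = \frac{\mathrm{vol}_n(\{cx : x \in S, c \in [0,1]\})}{\mathrm{vol}_n(\mathbb{B}_{n,p})}, \qquad S \in \mathcal{S}_{n,p}, \]

where $\mathbb{B}_{n,p} = \{x \in \mathbb{R}^n : \|x\|_{n,p} \le 1\}$ denotes the unit $\ell^p$ ball in $\mathbb{R}^n$.


Remark 2.7. The cone measure coincides with the surface measure on $\mathbb{S}_{n,p}$ if and
only if $p = 1$, $2$, or $\infty$. See [19, §3] and [17, §3] for more extensive discussions.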

Then, we establish the following LDP for the sequence of empirical measures $(L_{n,p})_{n \in \mathbb{N}}$
when $X^{(n,p)}$ is distributed according to the cone measure $\gamma_{n,p}$.

Theorem 2.8 (LDP under the cone measure). Let $p \in [1,\infty]$ and assume $X^{(n,p)} \sim \gamma_{n,p}$.
For $q < p$, the sequence of empirical measures $(L_{n,p})_{n \in \mathbb{N}}$ of (2.1) satisfies an LDP in $\mathcal{P}_q(\mathbb{R})$
equipped with the $W_q$ topology, with the convex good rate function $\mathbb{H}_p$ of (2.5).


The proof of Theorem 2.8 is given in Section 3. 


Remark 2.9. The Sanov-type LDP of Theorem 2.8 complements the existing Glivenko-Cantelli-type
LLN (law of large numbers) and Donsker-type CLT (central limit theorem)
for the empirical measure $L_{n,p}$ under the cone measure [21, Theorem 1].

The analysis of the LDP for $(L_{n,p})_{n \in \mathbb{N}}$ when $X^{(n,p)}$ is distributed according to the cone
measure is facilitated by a certain probabilistic representation for the cone measure. For
example, it is well known that if $Z$ is an $n$-dimensional standard Gaussian random variable,
then $Z / \|Z\|_{n,2}$ is uniformly distributed on the Euclidean sphere $\mathbb{S}_{n,2}$. More generally, we
have the following:

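Lemma 2.10 ([19, §3] and [20, Lemma 1]). Fix $n \in \mathbb{N}$ and $p \in [1,\infty]$. Suppose $X^{(n,p)} \sim \gamma_{n,p}$,
and let $Y^{(n,p)} \sim \mu_p^{\otimes n}$. Then,

\[ X^{(n,p)} \overset{(d)}{=} \frac{Y^{(n,p)}}{\|Y^{(n,p)}\|_{n,p}}. \tag{2.6} \]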


This representation allows us to exploit the underlying independence structure of the 
cone measure and leverage results from classical large deviations theory in the proof of 
Theorem 2.8. 
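The representation of Lemma 2.10 also suggests a simple way to simulate from the cone measure. The sketch below is illustrative only: it assumes $1 \le p < \infty$, uses the elementary fact that $|Y|^p / p$ has a Gamma$(1/p, 1)$ distribution when $Y$ has density proportional to $e^{-|y|^p/p}$, and all function names are hypothetical. It also computes moments of the empirical measure of the scaled coordinates $n^{1/p} X_i$, whose $p$-th moment equals 1 by construction.

```python
import math
import random


def sample_generalized_gaussian(p, rng):
    """Draw Y with density proportional to exp(-|y|^p / p), for 1 <= p < infinity.

    Uses the fact that |Y|^p / p is Gamma(1/p, 1) distributed, with a random sign.
    """
    g = rng.gammavariate(1.0 / p, 1.0)
    magnitude = (p * g) ** (1.0 / p)
    return magnitude if rng.random() < 0.5 else -magnitude


def sample_cone_measure(n, p, rng):
    """Draw X = Y / ||Y||_p with Y having i.i.d. generalized Gaussian coordinates."""
    y = [sample_generalized_gaussian(p, rng) for _ in range(n)]
    norm = sum(abs(v) ** p for v in y) ** (1.0 / p)
    return [v / norm for v in y]


def scaled_atoms(x, p):
    """Atoms n^{1/p} x_i of the empirical measure of the scaled coordinates."""
    n = len(x)
    return [n ** (1.0 / p) * v for v in x]


def empirical_moment(atoms, q):
    """q-th absolute moment of the empirical measure with the given atoms."""
    return sum(abs(a) ** q for a in atoms) / len(atoms)
```

For instance, with `rng = random.Random(0)` and `x = sample_cone_measure(500, 3.0, rng)`, the call `empirical_moment(scaled_atoms(x, 3.0), 3.0)` returns 1 up to rounding, while lower-order moments such as `q = 2.0` are at most 1 by the power mean inequality.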

Our next set of results provides insight into the most probable asymptotic behavior of
$X^{(n,p)}$, conditioned on a certain class of rare events. Before stating our results, recall the
following Poincaré-Maxwell-Borel type result for the asymptotic independence induced by
$\gamma_{n,p}$ and $\sigma_{n,p}$.

Lemma 2.11. Fix $p \in [1,\infty]$ and $k \in \mathbb{N}$. For $n \in \mathbb{N}$ such that $k < n$, suppose that either
$X^{(n,p)} \sim \sigma_{n,p}$ or $X^{(n,p)} \sim \gamma_{n,p}$. Then, as $n \to \infty$,

\[ n^{1/p}(X_1^{(n,p)}, \ldots, X_k^{(n,p)}) \Rightarrow \mu_p^{\otimes k}. \]

The preceding lemma is classical in the case $p = 2$. For general $p \in [1,\infty]$, Lemma 2.11
is due to Theorem 4.1 and Theorem 4.4 of [19] (in the case of the cone measure) and [16] 
(in the case of the surface measure). In addition, Theorem 3 and Theorem 4 of [18] offer a 
simplification of the proof for both the cone and surface measure. In fact, the results we cite 
are stated in the form of a finite n bound on the total variation distance, but the asymptotic 
weak convergence result stated in Lemma 2.11 is all we will need in the sequel. 

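Our next result yields a “conditional” version of the preceding lemma. Let $h(\cdot)$ denote
the differential entropy of a probability measure $\nu \in \mathcal{P}(\mathbb{R})$,

\[ h(\nu) = \begin{cases} -\int_{\mathbb{R}} \log\left(\frac{d\nu}{dx}\right) d\nu & \text{if } \nu \text{ is absolutely continuous with respect to Lebesgue measure,} \\ -\infty & \text{else.} \end{cases} \tag{2.7} \]

Differential entropy arises naturally in the analysis of $\mathbb{H}_p$, since for measures $\nu \in \mathcal{P}(\mathbb{R})$ that
are absolutely continuous with respect to Lebesgue measure and satisfy $m_p(\nu) \le 1$,

\[ \mathbb{H}_p(\nu) = -h(\nu) + \frac{1}{p} m_p(\nu) + \log\left(2 p^{1/p} \Gamma(1 + \tfrac{1}{p})\right) + \frac{1}{p}(1 - m_p(\nu)) = -h(\nu) + C_p, \tag{2.8} \]

for a finite constant $C_p$ that depends only on $p$, but not on $\nu$. Using the LDP of Theorem
2.8, we can obtain the following conditional limit theorem, which involves a constrained
maximum entropy problem.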

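Theorem 2.12 (Conditional limit theorem). Fix $p \in [1,\infty]$ and suppose that either
$X^{(n,p)} \sim \sigma_{n,p}$ or $X^{(n,p)} \sim \gamma_{n,p}$. Fix a closed interval $C = [\alpha, \beta] \subset \mathbb{R}$, and for $\epsilon > 0$, let
$C_\epsilon = [\alpha - \epsilon, \beta + \epsilon]$. Then, for $q < p$, the optimizing measure

\[ \nu_* = \arg\max\{h(\nu) : m_p(\nu) \le 1, \; m_q(\nu) \in C\} \tag{2.9} \]

is well defined (i.e., exists and is unique), and

\[ \lim_{\epsilon \to 0} \lim_{n \to \infty} \mathbb{P}(L_{n,p} \in \cdot \mid m_q(L_{n,p}) \in C_\epsilon) = \delta_{\nu_*}. \tag{2.10} \]

Moreover, for $k \in \mathbb{N}$,

\[ \mathbb{P}\left( n^{1/p}(X_1^{(n,p)}, \ldots, X_k^{(n,p)}) \in \cdot \,\middle|\, \frac{1}{n} \sum_{i=1}^{n} |n^{1/p} X_i^{(n,p)}|^q \in C_\epsilon \right) \Rightarrow \nu_*^{\otimes k} \tag{2.11} \]

as $n \to \infty$ followed by $\epsilon \to 0$.

The proof of Theorem 2.12 is given in Section 5.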

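Note that the conditioning event of (2.11) is merely a restatement of the conditioning
event of (2.10). That is, for any $p \in [1,\infty]$ and $q \in [1,\infty)$, we have the identity

\[ \frac{1}{n} \|n^{1/p} X^{(n,p)}\|_{n,q}^q = \frac{1}{n} \sum_{i=1}^{n} |n^{1/p} X_i^{(n,p)}|^q = m_q(L_{n,p}). \tag{2.12} \]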

We now describe an interesting application of Theorem 2.12 that admits a precise geometric
interpretation. Specifically, we show that in high dimensions, when a random sample
from the surface measure of a (suitably scaled) $\ell^p$ sphere is conditioned on having a sufficiently
small $\ell^q$ norm (for $q < p$), then it behaves like a sample from the surface measure of
a corresponding $\ell^q$ sphere. That is, the particular conditioning operation specified in (2.14)
induces a probabilistic change that admits a geometric interpretation. A precise statement
is given below. First, define for q,p e[i 


(2.13) 




q/p 


Roughly speaking, the constant $\beta_{p,q}$ is chosen such that for $\beta < \beta_{p,q}$ and the interval $C =
[0, \beta]$, the variational problem of (2.9) has a solution with a natural geometric interpretation.
Also, note that this constant $\beta_{p,q}$ is “small”, in the sense that if $q < p$, then $\beta_{p,q} < 1$; we refer
to Remark 5.4 for why the following result of Corollary 2.13 is stated only for “small $\beta$”.
Note that the variational problem of (2.9) does have an explicit solution even for intervals
$C = [0, \beta]$ where $\beta > \beta_{p,q}$. For further discussion of this issue, we refer to Remark 5.5 for
why the “large $\beta$” setting is less interesting, and to Remark 5.6 for some comments on an
“intermediate $\beta$” setting. We now state the aforementioned application of Theorem 2.12,
which applies in the “small $\beta$” regime.


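Corollary 2.13. Fix $p \in [1,\infty)$ and suppose that either $X^{(n,p)} \sim \sigma_{n,p}$ or $X^{(n,p)} \sim \gamma_{n,p}$.
Fix $q < p$ and $\beta < \beta_{p,q}$, and let $\lambda_{n,q}$ denote the law of the $k$ coordinates
$(n\beta)^{1/q}(X_1^{(n,q)}, \ldots, X_k^{(n,q)})$, assuming either $X^{(n,q)} \sim \sigma_{n,q}$ or $X^{(n,q)} \sim \gamma_{n,q}$. Furthermore, for
$\epsilon > 0$, let $\lambda_{n,p|q,\epsilon}$ be the conditional law,

\[ \lambda_{n,p|q,\epsilon} = \mathbb{P}\left( n^{1/p}(X_1^{(n,p)}, \ldots, X_k^{(n,p)}) \in \cdot \,\middle|\, \frac{1}{n} \|n^{1/p} X^{(n,p)}\|_{n,q}^q < \beta + \epsilon \right). \tag{2.14} \]

Lastly, let $\rho$ be a metric which metrizes the topology of weak convergence of probability
measures (e.g., Lévy-Prohorov, bounded Lipschitz). Then, we have

\[ \lim_{\epsilon \to 0} \lim_{n \to \infty} \rho\left(\lambda_{n,q}, \lambda_{n,p|q,\epsilon}\right) = 0. \]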


We prove Corollary 2.13 in Section 5. 
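To clarify the scaling in Corollary 2.13, note that $\lambda_{n,p|q,\epsilon}$ is the law of the $k$
coordinates $n^{1/p}(X_1^{(n,p)}, \ldots, X_k^{(n,p)})$ conditioned on the event $\{n^{1/p} X^{(n,p)} \in A_{n,\epsilon}\}$, where

\[ A_{n,\epsilon} = \{y \in \mathbb{R}^n : \tfrac{1}{n} \|y\|_{n,q}^q < \beta + \epsilon\}. \]

Conditional on the rare event that $n^{1/p} X^{(n,p)}$ lies in $A_{n,\epsilon}$, we intuitively expect that for large
$n$, the conditional law $\lambda_{n,p|q,\epsilon}$ will be close to a measure that concentrates on $A_{n,\epsilon}$. This
intuition motivates the introduction of the distribution $\sigma_{n,q}$ (or $\gamma_{n,q}$) and the scaling $(n\beta)^{1/q}$,
which induce a distribution that concentrates on $A_{n,\epsilon}$. That is, it is immediate from the
definition of $\sigma_{n,q}$ and $\gamma_{n,q}$ that for all $n \in \mathbb{N}$ and $\mathbb{P}$-a.s., $\frac{1}{n} \|(n\beta)^{1/q} X^{(n,q)}\|_{n,q}^q = \beta$. It follows
that $\mathbb{P}((n\beta)^{1/q} X^{(n,q)} \in A_{n,\epsilon}) = 1$ for all $\epsilon > 0$.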


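3. LDP under the cone measure. Throughout this section, fix $p \in [1,\infty]$ and for $n \in \mathbb{N}$,
let $L_{n,p}$ be defined as in (2.1) with $X^{(n,p)} \sim \gamma_{n,p}$. Also, let $Y^{(n,p)} \sim \mu_p^{\otimes n}$, where $\mu_p$ is as defined
in (2.4), and let $L^Y_{n,p}$ denote the empirical measure of $Y^{(n,p)}$,

\[ L^Y_{n,p} = \frac{1}{n} \sum_{i=1}^{n} \delta_{Y_i^{(n,p)}}. \]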


In view of the representation in Lemma 2.10, our proof of Theorem 2.8 consists of the
following steps:

1. write $L_{n,p}$ as a mapping of $L^Y_{n,p}$ and its $p$-th absolute moment $m_p(L^Y_{n,p})$, and show that
this mapping is continuous with respect to the weak topology (Lemma 3.1);

2. prove a (joint) LDP for $(L^Y_{n,p}, m_p(L^Y_{n,p}))$ in $\mathcal{P}(\mathbb{R}) \times \mathbb{R}_+$ (Lemma 3.2 and Lemma 3.3);

3. apply the contraction principle to obtain the desired LDP for $(L_{n,p})_{n \in \mathbb{N}}$ in the weak
topology (Proposition 3.4);

4. show convexity of $\mathbb{H}_p$ (Lemma 3.6);

5. extend the LDP to the Wasserstein topology, and conclude the proof.

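To begin with, let $G_p : \mathcal{P}(\mathbb{R}) \times \mathbb{R}_+ \to \mathcal{P}(\mathbb{R})$ be defined as

\[ G_p(\nu, c) = \nu(\cdot \times c^{1/p}). \tag{3.1} \]

Note that by Lemma 2.10, for $S \in \mathcal{B}(\mathbb{R})$,

\[ L_{n,p}(S) \overset{(d)}{=} \frac{1}{n} \sum_{i=1}^{n} \delta_{n^{1/p} Y_i^{(n,p)} / \|Y^{(n,p)}\|_{n,p}}(S) = \frac{1}{n} \sum_{i=1}^{n} \delta_{Y_i^{(n,p)}}\left(S \times \frac{\|Y^{(n,p)}\|_{n,p}}{n^{1/p}}\right). \]

Rewriting this in terms of the $p$-th moment $m_p$ defined in (2.3), we have

\[ L_{n,p} \overset{(d)}{=} G_p\left(L^Y_{n,p}, m_p(L^Y_{n,p})\right). \tag{3.2} \]

When not otherwise specified, we equip $\mathcal{P}(\mathbb{R})$ with the weak topology, $\mathbb{R}_+$ with the Euclidean
topology, and $\mathcal{P}(\mathbb{R}) \times \mathbb{R}_+$ with the induced product topology.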

Lemma 3.1. The map $G_p : \mathcal{P}(\mathbb{R}) \times \mathbb{R}_+ \to \mathcal{P}(\mathbb{R})$ of (3.1) is continuous.

Proof. It is sufficient to prove sequential continuity since the product topology is metrizable,
and thus, sequential. Fix $(\nu, c) \in \mathcal{P}(\mathbb{R}) \times \mathbb{R}_+$ and a convergent sequence of probability
measures $\nu_n \Rightarrow \nu$ and positive reals $c_n \to c$. For $n \in \mathbb{N}$, let $\xi_n \sim \nu_n$ and $\xi \sim \nu$.
Then, by Slutsky’s theorem, $c_n^{-1/p} \xi_n \Rightarrow c^{-1/p} \xi$. Since $c_n^{-1/p} \xi_n \sim \nu_n(\cdot \times c_n^{1/p})$ and, likewise,
$c^{-1/p} \xi \sim \nu(\cdot \times c^{1/p})$, this proves that $G_p(\nu_n, c_n) \Rightarrow G_p(\nu, c)$. □

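Lemma 3.2. The sequence $(L^Y_{n,p}, m_p(L^Y_{n,p}))_{n \in \mathbb{N}}$ satisfies an LDP in $\mathcal{P}(\mathbb{R}) \times \mathbb{R}_+$ with the
good rate function $J$ defined as follows: for $\nu \in \mathcal{P}(\mathbb{R})$ and $c \in \mathbb{R}_+$, let

\[ J(\nu, c) = \sup_{f \in C_b(\mathbb{R}), \, t \in \mathbb{R}} \left\{ \int_{\mathbb{R}} f(y) \, \nu(dy) + tc - \log \int_{\mathbb{R}} e^{f(y) + t|y|^p} \mu_p(dy) \right\}, \tag{3.3} \]

where $C_b(\mathbb{R})$ denotes the space of all bounded continuous functions from $\mathbb{R}$ to $\mathbb{R}$.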

Proof. The $n$-th term in our sequence of interest, $S_n = (L^Y_{n,p}, m_p(L^Y_{n,p}))$, is the empirical
mean of the i.i.d. random variables $(\delta_{Y_i^{(n,p)}}, |Y_i^{(n,p)}|^p)$, $i = 1, \ldots, n$, which take values in
the Polish space $\mathcal{P}(\mathbb{R}) \times \mathbb{R}_+$. To establish the LDP for $(S_n)_{n \in \mathbb{N}}$, we will first apply Cramér’s
theorem for general Polish spaces, as can be found in Theorem 6.1.3 of [9], to show that
$(S_n)_{n \in \mathbb{N}}$ satisfies a weak LDP with rate function $J$. To verify the conditions of that theorem,
let $\mathcal{M}(\mathbb{R})$ be the space of finite regular Borel measures on $\mathbb{R}$, equipped with the
weak topology (i.e., the coarsest topology such that the functionals $\nu \mapsto \int \phi \, d\nu$ are continuous
for all $\phi \in C_b(\mathbb{R})$), and let $\mathcal{X} = \mathcal{M}(\mathbb{R}) \times \mathbb{R}$ be endowed with the product topology.
Note that $\mathcal{X}$ is a locally convex Hausdorff topological space (being the product of two such
spaces), $\mathcal{P}(\mathbb{R}) \times \mathbb{R}_+$ is a closed convex subset of $\mathcal{X}$, and the relative topology coincides with
the product topology on $\mathcal{P}(\mathbb{R}) \times \mathbb{R}_+$. This shows that Assumption 6.1.2(a) of [9] is satisfied.
In view of Remark (b) on p. 253 of [9], to verify the remaining condition, which is
stated in Assumption 6.1.2(b) of [9], it suffices to show that there exists a metric $d(\cdot, \cdot)$ on
$\mathcal{P}(\mathbb{R}) \times \mathbb{R}_+$ (compatible with its topology) that satisfies the following convexity condition:
for all $\alpha \in [0,1]$ and $x_1, x_2, y_1, y_2 \in \mathcal{P}(\mathbb{R}) \times \mathbb{R}_+$,

\[ d(\alpha x_1 + (1 - \alpha) x_2, \alpha y_1 + (1 - \alpha) y_2) \le \max\{d(x_1, y_1), d(x_2, y_2)\}. \tag{3.4} \]

It is known that the convexity condition (3.4) holds for $d_{LP}$, the Lévy-Prohorov metric which
metrizes weak convergence on $\mathcal{P}(\mathbb{R})$ [9, p. 261], and it is easy to see that it also holds for
$d_{Euc}$, the Euclidean metric on $\mathbb{R}_+$. Elementary calculations show that (3.4) also holds for
the product metric $d_\infty$, defined by

\[ d_\infty((\mu, s), (\nu, t)) = \max\{d_{LP}(\mu, \nu), d_{Euc}(s, t)\}, \]

for $\mu, \nu \in \mathcal{P}(\mathbb{R})$ and $s, t \in \mathbb{R}_+$.

We can now apply Theorem 6.1.3 of [9] to conclude that $(S_n)_{n \in \mathbb{N}}$ satisfies a weak LDP
with rate function

\[ \Lambda^*(x) = \sup_{\lambda \in \mathcal{X}^*} \{\langle \lambda, x \rangle - \Lambda(\lambda)\}, \qquad x \in \mathcal{X}, \]

where $\mathcal{X}^*$ is the topological dual of $\mathcal{X}$, $\langle \cdot, \cdot \rangle$ denotes the duality pairing of $\mathcal{X}^*$ and $\mathcal{X}$, and

\[ \Lambda(\lambda) = \log \mathbb{E}[e^{\langle \lambda, S_1 \rangle}], \qquad \lambda \in \mathcal{X}^*. \]

To show that $\Lambda^* = J$ as given in (3.3), note that for $\mathcal{X} = \mathcal{M}(\mathbb{R}) \times \mathbb{R}$,

\[ \mathcal{X}^* = (\mathcal{M}(\mathbb{R}) \times \mathbb{R})^* = (\mathcal{M}(\mathbb{R}))^* \times \mathbb{R}^* \simeq C_b(\mathbb{R}) \times \mathbb{R}, \]

where $\simeq$ means isomorphic. That is, for $\lambda \in \mathcal{X}^*$, there exists a unique $(f, t) \in C_b(\mathbb{R}) \times \mathbb{R}$
such that for all $x = (\nu, y) \in \mathcal{M}(\mathbb{R}) \times \mathbb{R} = \mathcal{X}$, we have $\langle \lambda, x \rangle = \int f \, d\nu + ty$. Therefore,

\[ \Lambda(\lambda) = \log \mathbb{E}\left[\exp\left(f(Y_1^{(n,p)}) + t |Y_1^{(n,p)}|^p\right)\right] = \log \int_{\mathbb{R}} e^{f(y) + t|y|^p} \mu_p(dy), \]

which shows that $\Lambda^* = J$.

To strengthen the weak LDP for $(S_n)_{n \in \mathbb{N}}$ to a full LDP and show that $J$ is a good rate function,
by Lemma 1.2.18 of [9], it suffices to show that the sequence $(S_n)_{n \in \mathbb{N}}$ is exponentially
tight, or equivalently, that the sequences $(L^Y_{n,p})_{n \in \mathbb{N}}$ and $(m_p(L^Y_{n,p}))_{n \in \mathbb{N}}$ are both exponentially
tight. However, it follows from Lemma 6.2.6 of [9] that the sequence $(L^Y_{n,p})_{n \in \mathbb{N}}$ is
exponentially tight. Moreover, note that 0 is in the interior of the domain of the log moment
generating function of $|Y_1^{(n,p)}|^p$. Hence, it follows from Corollary 6.1.6 of [9] and Lemma
2.6 of [15] that the sequence $(m_p(L^Y_{n,p}))_{n \in \mathbb{N}}$ is also exponentially tight. To conclude, the
sequence $(L^Y_{n,p}, m_p(L^Y_{n,p}))_{n \in \mathbb{N}}$ is exponentially tight and satisfies a weak LDP, and hence,
satisfies a full LDP with good rate function $J$. □


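Lemma 3.3. The good rate function $J$ of (3.3) can be written as follows: for $\nu \in \mathcal{P}(\mathbb{R})$
and $c \in \mathbb{R}_+$,

\[ J(\nu, c) = \begin{cases} H(\nu \| \mu_p) + \frac{1}{p}(c - m_p(\nu)) & \text{if } m_p(\nu) \le c, \\ +\infty & \text{else.} \end{cases} \tag{3.5} \]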

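Proof. For $t \in (-\infty, \frac{1}{p})$, define $\mu_{p,t} \in \mathcal{P}(\mathbb{R})$ as follows:

\[ \mu_{p,t}(dy) = \frac{(1 - pt)^{1/p}}{2 p^{1/p} \Gamma(1 + \frac{1}{p})} e^{-(1 - pt)|y|^p/p} \, dy. \tag{3.6} \]

Note that $\mu_p = \mu_{p,0}$, and $d\mu_{p,t}/d\mu_p = (1 - pt)^{1/p} e^{t|y|^p}$.

To prove the equality (3.5), we apply the Donsker-Varadhan variational formula for relative
entropy (see, e.g., Lemma 1.4.3(a) of [10]), and use the expression (3.6) for $\mu_{p,t}$. Note
that for $t \ge \frac{1}{p}$ and every $f \in C_b(\mathbb{R})$, we have $\log \int_{\mathbb{R}} e^{f(y) + t|y|^p} \mu_p(dy) = +\infty$, so it suffices to take the supremum of
(3.3) over $t < \frac{1}{p}$. Thus, for $(\nu, c) \in \mathcal{P}(\mathbb{R}) \times \mathbb{R}_+$,

\[ J(\nu, c) = \sup_{t < 1/p} \left\{ tc + \frac{1}{p} \log(1 - pt) + \sup_{f \in C_b(\mathbb{R})} \left\{ \int_{\mathbb{R}} f(y) \, \nu(dy) - \log \int_{\mathbb{R}} e^{f(y)} \mu_{p,t}(dy) \right\} \right\} \]
\[ = \sup_{t < 1/p} \left\{ tc + \frac{1}{p} \log(1 - pt) + H(\nu \| \mu_{p,t}) \right\}. \]

We can rewrite each term in the supremum as follows:

\[ tc + \frac{1}{p} \log(1 - pt) + H(\nu \| \mu_{p,t}) \]
\[ = tc + \frac{1}{p} \log(1 - pt) + \int_{\mathbb{R}} \log\left(\frac{d\nu}{d\mu_p}(y)\right) \nu(dy) - \int_{\mathbb{R}} \log\left(\frac{d\mu_{p,t}}{d\mu_p}(y)\right) \nu(dy) \]
\[ = tc + \frac{1}{p} \log(1 - pt) + H(\nu \| \mu_p) - \left( \frac{1}{p} \log(1 - pt) + t \, m_p(\nu) \right) \]
\[ = t(c - m_p(\nu)) + H(\nu \| \mu_p). \]

Thus, if $m_p(\nu) > c$, then $J(\nu, c) = +\infty$ (let $t \to -\infty$); otherwise, if $m_p(\nu) \le c$, then

\[ J(\nu, c) = H(\nu \| \mu_p) + \sup_{t < 1/p} \{t(c - m_p(\nu))\}, \]

which proves the expression in (3.5). □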

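Proposition 3.4. Let $p \in [1,\infty]$ and assume $X^{(n,p)} \sim \gamma_{n,p}$. The sequence $(L_{n,p})_{n \in \mathbb{N}}$ of
(2.1) satisfies an LDP in $\mathcal{P}(\mathbb{R})$ equipped with the weak topology, with the good rate function
$\mathbb{H}_p$ of (2.5).

Proof. Due to the representation (3.2), the continuity of the map $G_p$ established in
Lemma 3.1, and the LDP for the sequence $(L^Y_{n,p}, m_p(L^Y_{n,p}))_{n \in \mathbb{N}}$ from Lemma 3.2, the contraction
principle yields an LDP for $(L_{n,p})_{n \in \mathbb{N}}$ with the good rate function

\[ \mathbb{J}_p(\nu) = \inf\{J(\lambda, c) : \lambda \in \mathcal{P}(\mathbb{R}), \; c \in \mathbb{R}_+, \; G_p(\lambda, c) = \nu\}. \]

It remains to show that the rate function $\mathbb{J}_p$ is identical to $\mathbb{H}_p$ of (2.5).

Let $\nu \in \mathcal{P}(\mathbb{R})$. From the definition of $G_p$ in (3.1), we see that if $G_p(\lambda, c) = \nu$ for some
$\lambda \in \mathcal{P}(\mathbb{R})$ and $c \in \mathbb{R}_+$, then $\lambda = \nu(\cdot \times c^{-1/p})$ and $m_p(\lambda) = c \, m_p(\nu)$. Therefore, using the
representation for $J(\nu, c)$ in Lemma 3.3, we have

\[ \mathbb{J}_p(\nu) = \inf\{J(\nu(\cdot \times c^{-1/p}), c) : c \in \mathbb{R}_+\} = \begin{cases} \inf_{c > 0} \left\{ H(\nu(\cdot \times c^{-1/p}) \| \mu_p) + \frac{1}{p}(c - c \, m_p(\nu)) \right\} & \text{if } m_p(\nu) \le 1, \\ +\infty & \text{else.} \end{cases} \]

If $m_p(\nu) > 1$, then $\mathbb{J}_p(\nu) = \mathbb{H}_p(\nu) = +\infty$. If $m_p(\nu) \le 1$, then use the fact that

\[ H(\nu(\cdot \times c^{-1/p}) \| \mu_p) = \int_{\mathbb{R}} \log\left( \frac{d\nu(\cdot \times c^{-1/p})}{d\mu_p}(x) \right) \nu(dx \times c^{-1/p}) = H(\nu \| \mu_p(\cdot \times c^{1/p})), \]

to find that

\[ \mathbb{J}_p(\nu) = \inf_{c > 0} \left\{ H(\nu \| \mu_p(\cdot \times c^{1/p})) + \frac{c}{p}(1 - m_p(\nu)) \right\} \]
\[ = H(\nu \| \mu_p) + \inf_{c > 0} \left\{ -\frac{1}{p} \log c + \frac{c - 1}{p} m_p(\nu) + \frac{c}{p}(1 - m_p(\nu)) \right\} \]
\[ = H(\nu \| \mu_p) - \frac{1}{p} m_p(\nu) + \frac{1}{p} \inf_{c > 0} \{c - \log c\} \]
\[ = H(\nu \| \mu_p) + \frac{1}{p}(1 - m_p(\nu)) = \mathbb{H}_p(\nu), \]

where the last line uses that $c - \log c$ attains its minimum value 1 at $c = 1$. □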


Note that Proposition 3.4 is non-trivial because it cannot be obtained via a simple application
of the contraction principle to the map $\nu \mapsto (\nu, m_p(\nu))$. Indeed, there does not
appear to be a standard topology on $\mathcal{P}(\mathbb{R})$ such that both the sequence $(L^Y_{n,p})_{n \in \mathbb{N}}$ satisfies
an LDP with good rate function and the map $m_p(\cdot)$ is continuous with respect to that topology.
For example, consider $\nu_n = (1 - \frac{1}{n}) \delta_0 + \frac{1}{n} \delta_{n^{1/p}}$; then, $\nu_n \Rightarrow \delta_0$ in the weak topology and
$m_p(\nu_n) = 1$ for all $n$, but $m_p(\delta_0) = 0$. The same counterexample holds for the $\tau$ topology
on $\mathcal{P}(\mathbb{R})$ (which is defined analogously to the weak topology, except that the test functions
in $C_b(\mathbb{R})$ are replaced by bounded measurable functions). On the other hand, $m_p(\cdot)$ is continuous
with respect to the $W_p$ topology, but in this case, $(L^Y_{n,p})_{n \in \mathbb{N}}$ does not satisfy an LDP
with respect to the $W_p$ topology. In particular, [14] and [26] develop strong exponential
integrability conditions that show that while the sequence $(L^Y_{n,p})_{n \in \mathbb{N}}$ satisfies an LDP with
good rate function with respect to the $W_q$ topology for all $q < p$, it does not satisfy such an
LDP for $q = p$.
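The failure of continuity of the moment map can be checked by hand on a two-atom family. The sketch below assumes the measures $\nu_n = (1 - \frac{1}{n}) \delta_0 + \frac{1}{n} \delta_{n^{1/p}}$, which we take to be the counterexample intended above; the function names are hypothetical. It shows that the $p$-th moment stays equal to 1 while integrals of a bounded continuous test function approach the value at 0.

```python
def moment_of_nu_n(n, p):
    """p-th absolute moment of nu_n = (1 - 1/n) delta_0 + (1/n) delta_{n^{1/p}}."""
    atom = n ** (1.0 / p)
    # The atom at 0 contributes nothing; the atom at n^{1/p} has mass 1/n.
    return (1.0 / n) * atom ** p


def integral_against_nu_n(f, n, p):
    """Integral of a test function f against nu_n."""
    atom = n ** (1.0 / p)
    return (1.0 - 1.0 / n) * f(0.0) + (1.0 / n) * f(atom)
```

For a bounded test function such as `lambda x: 1.0 / (1.0 + x * x)`, the integral tends to `f(0.0)` as `n` grows, whereas `moment_of_nu_n(n, p)` remains 1 up to rounding for every `n`.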

An alternative approach to the contraction principle that is used in the theory of large 
deviations is Varadhan’s lemma, which allows one to transfer LDPs from one sequence of 
probability measures to another sequence which is (termwise) absolutely continuous with 
respect to the former. However, this approach is not applicable in our setting because the 
laws of $L_{n,p}$ and $L^Y_{n,p}$ are mutually singular.

In a different direction, instead of appealing to the LDP for $(L^Y_{n,p})_{n \in \mathbb{N}}$, we could attempt
to analyze the LDP for $(L_{n,p})_{n \in \mathbb{N}}$ directly. In particular, note that $(X^{(1,p)}, X^{(2,p)}, \ldots)$ is an
infinite triangular array where each row $X^{(n,p)}$ consists of exchangeable random variables,
so $L_{n,p}$ is the empirical measure of a finite exchangeable sequence. Some large deviations
results for the empirical measure of a row-wise exchangeable triangular array can be found
in [24], but on their p. 653, they acknowledge that “Even in the simple case of binary valued
finite exchangeable random variables there is no general result for the LD behavior of [the
row-wise empirical measure].” That is, exchangeable structure on its own is not sufficient
for a general LDP result (and in particular, not sufficient for an LDP for $(L_{n,p})_{n \in \mathbb{N}}$).

Yet another approach to the LDP of Theorem 2.8 is through the machinery of a “non-continuous”
version of the contraction principle developed in [2, §6.2-6.3]. In particular, the
authors of [2] explicitly state the result for $p = 2$ in the weak topology. However, Theorem
2.5 and Theorem 2.8 improve this result in multiple ways. First, our results are for general
$p \in [1,\infty]$. Second, we have results for both the cone measure and surface measure, whereas
the $p = 2$ case is one of only three cases ($p = 1, 2, \infty$) where the two measures coincide.
Third, in Lemma 3.3 we provide an explicit proof by computing the rate function $J$ of
Lemma 3.2 directly from the variational formula given by Cramér’s theorem in $\mathcal{P}(\mathbb{R}) \times \mathbb{R}_+$,
instead of appealing to higher level results. Lastly, as shown below, our result extends from
the weak topology to the stronger Wasserstein topology on $\mathcal{P}(\mathbb{R})$.


Lemma 3.5. Let $K_p = \{\nu \in \mathcal{P}(\mathbb{R}) : m_p(\nu) \le 1\}$. Then, for all $q < p$, the set $K_p \subset \mathcal{P}_q(\mathbb{R})$
is compact with respect to the $W_q$ topology. In addition, $K_p$ is convex and non-empty.


Proof. The properties of convexity and non-emptiness are elementary. As for compactness,
we first prove that $K_p$ is weakly compact. For $\nu \in K_p$, for all $M > 0$, Chebyshev’s
inequality yields

\[ \nu([-M, M]^c) \le \frac{m_p(\nu)}{M^p} \le \frac{1}{M^p}. \]

Thus, $K_p$ is tight, and by Prokhorov’s theorem, precompact. Note that $K_p$ is weakly closed,
since it is a level set of the map $\nu \mapsto \int |x|^p \nu(dx)$, which is lower semicontinuous due to the
Portmanteau theorem for weak convergence. Therefore, $K_p$ is weakly compact.

To verify Wasserstein compactness, it suffices to show that the probability measures
in $K_p$ have uniformly integrable $q$-th moments [1, Proposition 7.1.5]. This latter condition
follows from the de la Vallée-Poussin theorem, since for $g(x) = |x|^{p/q}$ (which satisfies
the superlinear growth condition $\lim_{|x| \to \infty} \frac{g(x)}{|x|} = \infty$), we have

\[ \sup_{\nu \in K_p} \int_{\mathbb{R}} g(|x|^q) \, \nu(dx) = \sup_{\nu \in K_p} m_p(\nu) \le 1 < \infty. \]

This uniform integrability implies that $K_p$ is $W_q$-compact. □


Lemma 3.6. Let $p \in [1,\infty]$. Then, $\mathbb{H}_p$ of (2.5) is convex.


Proof. Note that the domain of $\mathbb{H}_p$ is convex, since $m_p$ is linear. Moreover, within
its domain, the function $\mathbb{H}_p$ is the sum of the relative entropy $H(\cdot\|\mu_p)$,
which is convex, and the $p$-th moment $m_p$, which is linear, so $\mathbb{H}_p$ is convex. □

Proof of Theorem 2.8. Fix $q < p$. In view of the fact that $(L_{n,p})_{n\in\mathbb{N}}$ satisfies
an LDP in $\mathcal{P}(\mathbb{R})$ with respect to the weak topology due to Proposition 3.4, in
order to establish the LDP in $\mathcal{P}_q(\mathbb{R})$ with respect to the Wasserstein topology,
it suffices to show exponential tightness of $(L_{n,p})_{n\in\mathbb{N}}$ in the $\mathcal{W}_q$
topology (see Corollary 4.2.6 of [9]). Let $K_p$ be the compact set (with respect to the
$\mathcal{W}_q$ topology) defined in Lemma 3.5. Note that $m_p(L_{n,p}) = 1$ a.s., so
$\mathbb{P}(L_{n,p} \in K_p^c) = 0$ for all $n$, which yields

$$\limsup_{n\to\infty} \frac{1}{n} \log \mathbb{P}(L_{n,p} \in K_p^c) = -\infty.$$

Thus, $(L_{n,p})_{n\in\mathbb{N}}$ is exponentially tight, and hence, satisfies the desired LDP
with respect to the $\mathcal{W}_q$ topology. Finally, the convexity of $\mathbb{H}_p$ is given by
Lemma 3.6. □

4. LDP under the surface measure. In this section, we show how Theorem 2.5, the 
LDP for the surface measure, can be obtained from the LDP for the cone measure. We first 
recall the following fundamental relationship between the cone and surface measure. 


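Lemma 4.1 ([18, Lemma 2]). Let $p \in [1,\infty)$. Then,

$$\frac{d\sigma_{n,p}}{d\gamma_{n,p}}(x) = C_{n,p} \left( \sum_{i=1}^n |x_i|^{2p-2} \right)^{1/2}, \quad x \in \mathbb{S}_{n,p},$$

where $C_{n,p}$ is the normalizing constant,

$$C_{n,p} = \left( \int_{\mathbb{S}_{n,p}} \left( \sum_{i=1}^n |x_i|^{2p-2} \right)^{1/2} \gamma_{n,p}(dx) \right)^{-1}.$$

For $p = \infty$, let $C_{n,\infty} = 1$.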


Next, we state a general result about LDPs for two sequences of measures that satisfy a 
particular absolute continuity relation. 


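Lemma 4.2. Let $X$ be a Polish space, and for $n \in \mathbb{N}$, let $\psi_n : X \to \mathbb{R}$.
Moreover, suppose that there exists a sequence of finite constants $(M_n)_{n\in\mathbb{N}}$
satisfying $\lim_{n\to\infty} M_n/n = 0$, such that for all $n \in \mathbb{N}$,

$$|\psi_n(\lambda)| \le M_n, \quad \mu_n\text{-a.e. } \lambda \in X.$$

Suppose $(\mu_n)_{n\in\mathbb{N}} \subset \mathcal{P}(X)$ satisfies an LDP with a good rate
function $I(\cdot)$. Define

(4.1)   $$\nu_n(d\lambda) = \frac{e^{\psi_n(\lambda)}}{\int_X e^{\psi_n(\lambda')}\,\mu_n(d\lambda')}\,\mu_n(d\lambda).$$

Then, $(\nu_n)_{n\in\mathbb{N}}$ satisfies an LDP with the same good rate function $I(\cdot)$.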


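Proof. For $\phi : X \to \mathbb{R}$ continuous and bounded, define

$$\Lambda_\phi = \lim_{n\to\infty} \frac{1}{n} \log \int_X e^{n\phi(\lambda)}\,\mu_n(d\lambda),$$

where the limit exists due to the LDP for $(\mu_n)_{n\in\mathbb{N}}$ and Varadhan's Lemma (see,
e.g., Theorem 4.3.1 of [9]). Next, note that the bound on $\psi_n$ implies that for every
$n \in \mathbb{N}$,

$$-M_n \le \log \int_X e^{\psi_n(\lambda)}\,\mu_n(d\lambda) \le M_n.$$

Since $M_n/n \to 0$, this implies

$$\lim_{n\to\infty} \frac{1}{n} \log \int_X e^{\psi_n(\lambda)}\,\mu_n(d\lambda) = 0.$$

Together with the definition (4.1), the bound on $\psi_n$, and the assumption on $M_n$, this
implies that

$$\liminf_{n\to\infty} \frac{1}{n} \log \int_X e^{n\phi(\lambda)}\,\nu_n(d\lambda) \ge \liminf_{n\to\infty} \frac{1}{n} \log \int_X e^{n\phi(\lambda) - M_n}\,\mu_n(d\lambda) = \Lambda_\phi - \lim_{n\to\infty} \frac{M_n}{n} = \Lambda_\phi,$$

$$\limsup_{n\to\infty} \frac{1}{n} \log \int_X e^{n\phi(\lambda)}\,\nu_n(d\lambda) \le \limsup_{n\to\infty} \frac{1}{n} \log \int_X e^{n\phi(\lambda) + M_n}\,\mu_n(d\lambda) = \Lambda_\phi + \lim_{n\to\infty} \frac{M_n}{n} = \Lambda_\phi.$$

Thus, we have shown that the corresponding limit for $(\nu_n)_{n\in\mathbb{N}}$ exists and coincides
with $\Lambda_\phi$:

$$\Lambda_\phi = \lim_{n\to\infty} \frac{1}{n} \log \int_X e^{n\phi(\lambda)}\,\nu_n(d\lambda).$$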

Note that $(\mu_n)_{n\in\mathbb{N}}$ is exponentially tight since it satisfies an LDP with good
rate function and $X$ is Polish [15, Lemma 2.6]. We claim that $(\nu_n)_{n\in\mathbb{N}}$ is also
exponentially tight. Let $L < \infty$, and let $K_L \subset X$ be a compact set such that

(4.2)   $$\limsup_{n\to\infty} \frac{1}{n} \log \mu_n(K_L^c) \le -L.$$

Then, given $L < \infty$, note that

$$\log \nu_n(K_L^c) = \log \int_{K_L^c} e^{\psi_n(\lambda)}\,\mu_n(d\lambda) - \log \int_X e^{\psi_n(\lambda)}\,\mu_n(d\lambda)$$
$$\le M_n + \log \mu_n(K_L^c) + M_n - \log \mu_n(X)$$
$$= 2M_n + \log \mu_n(K_L^c).$$

Taking the limit supremum as $n \to \infty$, using (4.2), and applying the fact that
$M_n/n \to 0$, we find that

$$\limsup_{n\to\infty} \frac{1}{n} \log \nu_n(K_L^c) \le \limsup_{n\to\infty} \frac{1}{n} \log \mu_n(K_L^c) \le -L.$$

Since the limit $\Lambda_\phi$ is the same for $(\mu_n)_{n\in\mathbb{N}}$ and
$(\nu_n)_{n\in\mathbb{N}}$, and both sequences are exponentially tight, Bryc's inverse lemma (see,
e.g., Theorem 4.4.2 of [9]) implies that the two sequences satisfy the same LDP with the good rate
function $I(\lambda) = \sup_{\phi \in C_b(X)} \{\phi(\lambda) - \Lambda_\phi\}$. □
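
The key estimate in the proof above, namely that reweighting by $e^{\psi_n}$ with $|\psi_n| \le M_n$ changes log-probabilities by at most $2M_n$, can be checked numerically on a toy example. The sketch below is only an illustration under assumptions that are not part of the paper: $\mu_n$ is taken to be the law of $S_n/n$ for $S_n \sim \mathrm{Binomial}(n, 1/2)$, and the weight $\psi_n(x) = \frac{1}{2}\log(1 + nx)$ (so $M_n = \frac{1}{2}\log(1+n)$) is a hypothetical choice made for convenience.

```python
import math

def log_binom_tail(n, k):
    """log mu_n(S_n >= k) for S_n ~ Binomial(n, 1/2), using exact integer sums."""
    tail = sum(math.comb(n, j) for j in range(k, n + 1))
    return math.log(tail) - n * math.log(2)

def log_tilted_tail(n, k, psi):
    """log nu_n(S_n >= k), where nu_n is mu_n reweighted by exp(psi) and normalized."""
    weights = [math.comb(n, j) * math.exp(psi(j / n)) for j in range(n + 1)]
    return math.log(sum(weights[k:])) - math.log(sum(weights))

def rate_gap(n, threshold=0.7):
    """Return (|log nu_n(A) - log mu_n(A)| / n, 2 * M_n / n) for A = {S_n / n >= threshold}."""
    k = math.ceil(threshold * n)
    psi = lambda x: 0.5 * math.log(1.0 + n * x)  # hypothetical weight, 0 <= psi <= M_n
    m_n = 0.5 * math.log(1.0 + n)
    gap = abs(log_tilted_tail(n, k, psi) - log_binom_tail(n, k)) / n
    return gap, 2.0 * m_n / n

if __name__ == "__main__":
    for n in (100, 400, 800):
        gap, bound = rate_gap(n)
        print(n, gap, bound)
```

Both printed quantities shrink as $n$ grows, which is the mechanism by which the two sequences share the same exponential rate.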

We apply the preceding lemma to the absolute continuity relation of Lemma 4.1 to prove
the LDP under the surface measure.

Proof of Theorem 2.5. The result is trivial for $p = 1, 2, \infty$, since then
$\sigma_{n,p} = \gamma_{n,p}$. Therefore, we restrict to $p \in (1,2)$ or $p \in (2,\infty)$. In
this setting, we apply Lemma 4.2 to the following: let $\ell_n : \mathbb{R}^n \to
\mathcal{P}(\mathbb{R})$ be the map that takes a vector $x \in \mathbb{R}^n$ to the empirical
measure of its coordinates, scaled as in (2.1), and set

• $X = \mathcal{P}_q(\mathbb{R})$ (equipped with the $q$-Wasserstein topology, for some $q < p$),

• $\mu_n = \gamma_{n,p} \circ \ell_n^{-1}$,

• $\psi_n = \psi = \frac{1}{2} \log m_{2p-2}$,

• $\nu_n = \sigma_{n,p} \circ \ell_n^{-1}$.

We now show that our setup satisfies the hypotheses of Lemma 4.2. For $n \in \mathbb{N}$, let

$$A_n = \{ \ell_n(x) : x \in \mathbb{S}_{n,p} \} \subset \mathcal{P}_q(\mathbb{R}).$$

Note that for all $n \in \mathbb{N}$, $\mu_n(A_n) = 1$. For $\lambda \in A_n$, using Lemma 4.1, we
can write the Radon-Nikodym derivative of $\nu_n$ with respect to $\mu_n$ (defined on a set of
$\mu_n$ measure 1), as

$$\frac{d\nu_n}{d\mu_n}(\lambda) = \frac{d\sigma_{n,p}}{d\gamma_{n,p}} = C_{n,p}\,\bigl(n\, m_{2p-2}(\lambda)\bigr)^{1/2}.$$

We know from Theorem 2.8 that $(\mu_n)_{n\in\mathbb{N}}$ satisfies an LDP with respect to the
$\mathcal{W}_q$ topology (for $q < p$), with good rate function $\mathbb{H}_p$.

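As for the boundedness properties of $\psi = \frac{1}{2} \log m_{2p-2}$ stipulated in Lemma 4.2,
first note that due to Hölder's inequality, for any $0 < r < s < \infty$ and
$\lambda \in \mathcal{P}(\mathbb{R})$,

$$m_r(\lambda) \le [m_s(\lambda)]^{r/s}.$$

Now, fix $1 < p < 2$. Then $0 < 2p - 2 < p$, and so applying the preceding inequality with
$r = 2p - 2$ and $s = p$, and invoking the equivalence (2.12) with $q = p$, we see that for
$\lambda \in A_n$,

$$\mu_n(m_{2p-2}(\lambda) > 1) \le \mu_n(m_p(\lambda)^{2-(2/p)} > 1) = \mathbb{P}\bigl(\|X^{(n,p)}\|_{n,p} > 1\bigr) = 0,$$

since $X^{(n,p)} \in \mathbb{S}_{n,p}$, $\mathbb{P}$-a.s. On the other hand, to lower bound
$m_{2p-2}(\lambda)$, recall that for $0 < r < s < \infty$ and $x \in \mathbb{R}^n$,

$$\left( \sum_{i=1}^n |x_i|^s \right)^{1/s} \le \left( \sum_{i=1}^n |x_i|^r \right)^{1/r}.$$

Applying this inequality with $r = 2p - 2$ and $s = p$, and recalling $L_{n,p}$ from (2.1), we have

$$m_{2p-2}(L_{n,p}) \ge \frac{1}{n} \bigl( n\, m_p(L_{n,p}) \bigr)^{(2p-2)/p} = n^{1-(2/p)}, \quad \mathbb{P}\text{-a.s.},$$

where the last equality once again uses the fact that $\mathbb{P}$-a.s. (under the cone measure),
$X^{(n,p)}$ lies on the unit $\ell^p$ sphere $\mathbb{S}_{n,p}$. In a similar fashion, for
$2 < p < \infty$, we have $2p - 2 > p$ and hence, for $\lambda \in A_n$,

$$\mu_n(m_{2p-2}(\lambda) < 1) \le \mu_n(m_p(\lambda)^{2-(2/p)} < 1) = 0,$$

and

$$m_{2p-2}(L_{n,p}) \le n^{1-(2/p)}, \quad \mathbb{P}\text{-a.s.}$$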





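Since $\mu_n(A_n) = 1$, we have shown that for $\mu_n$-a.e. $\lambda \in \mathcal{P}(\mathbb{R})$,

$$|\psi(\lambda)| = \left| \tfrac{1}{2} \log m_{2p-2}(\lambda) \right| \le M_n,$$

where

$$M_n = \left| \frac{1}{2} - \frac{1}{p} \right| \log n.$$

Since $M_n/n \to 0$, we can apply Lemma 4.2, which shows that the sequence of empirical
measures $(L_{n,p})_{n\in\mathbb{N}}$ under the surface measure $\sigma_{n,p}$ satisfies an LDP
with the same good rate function as under the cone measure $\gamma_{n,p}$. □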
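
The bound $|\psi| \le M_n$ used in the proof above can be sanity-checked numerically. The following sketch is an illustration only and rests on two assumptions that this section does not restate: the empirical measure is built from the coordinates scaled by $n^{1/p}$, and a cone-measure sample can be generated as $Y/\|Y\|_p$ with $Y_i$ i.i.d. of density proportional to $e^{-|y|^p/p}$ (a standard representation of the cone measure). The function names are hypothetical. Since the bound is deterministic on the sphere, the random sampling only serves to produce test points.

```python
import math
import random

def sample_cone(n, p, rng):
    """Point on the unit l^p sphere, drawn as Y / ||Y||_p with Y_i i.i.d. of density ∝ exp(-|y|^p / p)."""
    ys = []
    for _ in range(n):
        g = rng.gammavariate(1.0 / p, 1.0)  # |Y|^p / p has a Gamma(1/p, 1) law
        sign = 1.0 if rng.random() < 0.5 else -1.0
        ys.append(sign * (p * g) ** (1.0 / p))
    norm = sum(abs(y) ** p for y in ys) ** (1.0 / p)
    return [y / norm for y in ys]

def moment(xs, n, p, r):
    """r-th absolute moment of the empirical measure of the scaled coordinates n^{1/p} x_i."""
    scale = n ** (1.0 / p)
    return sum(abs(scale * x) ** r for x in xs) / n

def psi_and_bound(n, p, rng):
    """Return (psi, M_n) with psi = (1/2) log m_{2p-2} and M_n = |1/2 - 1/p| log n."""
    x = sample_cone(n, p, rng)
    psi = 0.5 * math.log(moment(x, n, p, 2 * p - 2))
    m_n = abs(0.5 - 1.0 / p) * math.log(n)
    return psi, m_n

if __name__ == "__main__":
    rng = random.Random(0)
    for p in (1.5, 3.0):
        for n in (10, 100, 1000):
            print(p, n, psi_and_bound(n, p, rng))
```

For $p < 2$ the computed $\psi$ is non-positive and for $p > 2$ it is non-negative, matching the two cases treated in the proof.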


5. Gibbs conditioning. We first recall a general version of the Gibbs conditioning
principle. The following theorem is one of the results of [13]. For the sake of completeness,
and also due to the fact that our statement below imposes slightly different conditions
than the result in [13] (although the proof is essentially unaffected), we include a short
proof.


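Proposition 5.1 (Gibbs conditioning principle, [13, Theorem 7.1]). Let $X$ be a topological
space, and let $(\xi_n)_{n\in\mathbb{N}}$ be a sequence of $X$-valued random variables that
satisfies an LDP with good rate function $I$. In addition, let $F \subset X$ be a subset such that

1. $I(F) := \inf_{x \in F} I(x) < \infty$;

2. $F$ is closed;

3. $F = \bigcap_{\varepsilon > 0} F_\varepsilon$ for a family of sets $(F_\varepsilon)_{\varepsilon > 0}$
such that $F_\varepsilon \subset X$ for all $\varepsilon > 0$ and
$\mathbb{P}(\xi_n \in F_\varepsilon) > 0$ for all $\varepsilon > 0$ and $n \in \mathbb{N}$;

4. $F \subset (F_\varepsilon)^\circ$ for all $\varepsilon > 0$.

Let $\mathcal{M}_F$ be the set of $x \in F$ which minimize $I$. That is,

$$\mathcal{M}_F = \{x \in F : I(x) = I(F)\}.$$

Then, for all open $G \subset X$ such that $\mathcal{M}_F \subset G$, we have

(5.1)   $$\limsup_{\varepsilon \to 0} \limsup_{n\to\infty} \frac{1}{n} \log \mathbb{P}(\xi_n \notin G \mid \xi_n \in F_\varepsilon) < 0.$$

As a consequence, if $\mathcal{M}_F = \{x\}$ is a singleton, then

(5.2)   $$\lim_{\varepsilon \to 0} \lim_{n\to\infty} \mathbb{P}(\xi_n \in \cdot \mid \xi_n \in F_\varepsilon) = \delta_x.$$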

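Proof. Due to the LDP for $(\xi_n)_{n\in\mathbb{N}}$, for $\varepsilon > 0$, with $F$,
$F_\varepsilon$ and $G$ as in the statement of the proposition, and $G^c$ representing the
complement of $G$,

$$\limsup_{n\to\infty} \frac{1}{n} \log \mathbb{P}(\xi_n \in G^c \mid \xi_n \in F_\varepsilon) \le \limsup_{n\to\infty} \frac{1}{n} \log \mathbb{P}(\xi_n \in G^c \cap F_\varepsilon) - \liminf_{n\to\infty} \frac{1}{n} \log \mathbb{P}(\xi_n \in (F_\varepsilon)^\circ)$$
$$\le -I(G^c \cap \overline{F_\varepsilon}) + I((F_\varepsilon)^\circ)$$
$$\le -I(G^c \cap \overline{F_\varepsilon}) + I(F),$$

where the last inequality follows from the assumption that $F \subset (F_\varepsilon)^\circ$.
Furthermore, the assumptions of the proposition state that $I$ is a good rate function and imply
the equality $\bigcap_{\varepsilon > 0} (G^c \cap \overline{F_\varepsilon}) = G^c \cap F$.
Therefore, by Lemma 4.1.6 of [9],

$$\lim_{\varepsilon \to 0} I(G^c \cap \overline{F_\varepsilon}) = I(G^c \cap F).$$

Then, (5.1) follows from the last two displays and the assumption that $\mathcal{M}_F$ (the set of
minimizers of $I$ in $F$) lies within $G$, which implies that

$$-I(G^c \cap F) + I(F) < -I(G^c \cap F) + I(G^c \cap F) = 0.$$

Note that (5.1) implies that conditional on $F_\varepsilon$, the law of $\xi_n$ concentrates on
the set $\mathcal{M}_F$ (as $n \to \infty$ and $\varepsilon \to 0$), in the sense that for every
open set $G$ such that $\mathcal{M}_F \subset G$, we have

$$\lim_{\varepsilon \to 0} \lim_{n\to\infty} \mathbb{P}(\xi_n \in G \mid \xi_n \in F_\varepsilon) = 1.$$

As this holds for all such $G$, the limit (5.2) follows as a simple consequence if
$\mathcal{M}_F$ is a singleton. □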
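
To make the concentration statement concrete, here is a toy computation in the classical i.i.d. setting rather than the $\ell^p$ sphere setting of this paper. It is a sketch under assumptions: $X_1, \dots, X_n$ are fair coin flips, the conditioning event is $\{S_n \ge an\}$ with $a = 0.7$ (a hypothetical choice), and no $\varepsilon$-enlargement is used. The relative entropy with respect to Bernoulli(1/2) is minimized over $\{\text{mean} \ge a\}$ by Bernoulli($a$), so the conditional probability that $X_1 = 1$ should approach $a$.

```python
import math

def conditional_first_coordinate(n, a):
    """P(X_1 = 1 | S_n >= a n) for i.i.d. fair coin flips X_1, ..., X_n, computed exactly."""
    k = math.ceil(a * n)
    # P(X_1 = 1, S_n = j) = C(n-1, j-1) / 2^n and P(S_n = j) = C(n, j) / 2^n;
    # the common factor 2^n cancels in the ratio.
    numerator = sum(math.comb(n - 1, j - 1) for j in range(k, n + 1))
    denominator = sum(math.comb(n, j) for j in range(k, n + 1))
    return numerator / denominator

if __name__ == "__main__":
    for n in (50, 200, 2000):
        print(n, conditional_first_coordinate(n, 0.7))
```

The printed values decrease toward $0.7$ as $n$ grows, in line with the limit (5.2) when the minimizer is unique.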

Remark 5.2. The conditions on $F$ and $F_\varepsilon$ of Proposition 5.1 are not the only
conditions under which such a result holds. For example, in the case where $\xi_n$ is the
empirical measure of $n$ i.i.d. random variables, it is possible to replace condition 4. with a
different condition on $F_\varepsilon$ (see, e.g., [9, p. 324, A-1]).

Our proof of the conditional limit theorem stated in Theorem 2.12 follows from the Gibbs 
conditioning principle of Proposition 5.1. In our proof, it is essential that we appeal to the 
LDPs of Theorem 2.5 and Theorem 2.8, which hold in a Wasserstein topology, which is 
stronger than the usual weak topology. 

Proof of Theorem 2.12. First, we show that Proposition 5.1 applies in the setting of
Theorem 2.12. Fix $p \in [1,\infty]$ and $q < p$. Due to Theorem 2.8 (for the cone measure) and
Theorem 2.5 (for the surface measure), the sequence $(L_{n,p})_{n\in\mathbb{N}}$ satisfies an LDP
in $\mathcal{P}_q(\mathbb{R})$ (equipped with the $\mathcal{W}_q$ topology) with good rate
function $\mathbb{H}_p$. Fix $C = [\alpha,\beta] \subset \mathbb{R}$ and for $\varepsilon > 0$,
define the sets

$$F_\varepsilon = \{\nu \in \mathcal{P}_q(\mathbb{R}) : m_q(\nu) \in C_\varepsilon\},$$

where $C_\varepsilon = [\alpha - \varepsilon, \beta + \varepsilon]$. It is immediate from
Definition 2.3 that for $q < p$, the moment map $m_q$ of (2.3) is continuous on
$\mathcal{P}_q(\mathbb{R})$, with respect to the $\mathcal{W}_q$ topology. Since the set
$C_\varepsilon$ is closed, this implies that $F_\varepsilon$ is closed, and hence, the set

$$F = \bigcap_{\varepsilon > 0} F_\varepsilon = \{\nu \in \mathcal{P}_q(\mathbb{R}) : m_q(\nu) \in C\}$$

is also closed. Moreover, for $\varepsilon > 0$, we also have

$$F = m_q^{-1}(C) \subset m_q^{-1}(C_\varepsilon^\circ) \subset [m_q^{-1}(C_\varepsilon)]^\circ = (F_\varepsilon)^\circ,$$

where the second inclusion makes another use of the continuity of $m_q$. Next, let us show
that $\mathcal{M}_F$ is a singleton (i.e., $\nu_*$ of (2.9) is well defined). Recall that

$$\mathcal{M}_F = \Bigl\{\nu \in F : \mathbb{H}_p(\nu) = \min_{\nu' \in F} \mathbb{H}_p(\nu')\Bigr\}.$$

Note that $F$ is closed and convex, since it is the preimage of a closed, convex set $C$ under a
continuous linear map $m_q$. Because $\mathbb{H}_p$ is lower semicontinuous and has compact level
sets, it achieves its minimum within $F$. That the minimum is achieved at a unique
$\nu_* \in F$ follows from the fact that $\mathbb{H}_p$ is strictly convex because it is the sum
of the strictly convex relative entropy $H(\cdot\|\mu_p)$ and a linear function $m_p$. The
representation for $\nu_*$ given in (2.9) follows from (2.8) and the definition (2.5) of
$\mathbb{H}_p$. Thus, (2.10) follows from (5.2) of Proposition 5.1.

As for the second result (2.11), this follows from Proposition 2.2 of [23], which, under
the assumption of exchangeability, establishes the equivalence of statements like (2.10)
(regarding convergence in probability of the empirical measure to a deterministic measure)
and statements like (2.11) (regarding joint convergence in distribution of any fixed $k$ of the
underlying random variables). To be precise, let $X$ be a Polish space, and for
$n \in \mathbb{N}$, let $(\xi_{n,1}, \dots, \xi_{n,n})$ be a sequence of $n$ exchangeable
$X$-valued random variables. Then, Proposition 2.2 of [23] states that the empirical measure
$\frac{1}{n} \sum_{i=1}^n \delta_{\xi_{n,i}}$ converges in law to some
$\nu \in \mathcal{P}(X)$ if and only if the law of $(\xi_{n,1}, \dots, \xi_{n,k})$ converges
weakly to the $k$-fold product of $\nu$. Given that $(X^{(n,p)}_1, \dots, X^{(n,p)}_n)$ is an
exchangeable sequence, it is still exchangeable conditional on the event
$\{\|X^{(n,p)}\|_{n,q} \in C_\varepsilon\}$, because $\|\cdot\|_{n,q}$ is a symmetric function
of the random variables $X^{(n,p)}_i$, $i = 1, \dots, n$. Since (2.10) establishes convergence
in law of the conditioned empirical measure $L_{n,p}$ to $\nu_*$, the joint convergence of (2.11)
follows. The preceding discussion essentially also outlines the approach followed in Corollary
7.3.5 of [9], which states a Gibbs conditioning result for the empirical measure of i.i.d. random
variables, but in fact the i.i.d. assumption is only invoked to establish exchangeability. □

As a prerequisite for the proof of Corollary 2.13, recall the following basic
information-theoretic fact.


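Lemma 5.3. Fix $r_i : \mathbb{R} \to \mathbb{R}$, $\alpha_i \in \mathbb{R}$, for
$i = 1, \dots, m$, and $s_j : \mathbb{R} \to \mathbb{R}$, $\beta_j \in \mathbb{R}$, for
$j = 1, \dots, n$. Consider the following maximization problem:

(5.3)
$$\begin{aligned}
\underset{\nu \in \mathcal{P}(\mathbb{R})}{\text{maximize}} \quad & h(\nu) \\
\text{subject to} \quad & \int_{\mathbb{R}} r_i(x)\,\nu(dx) = \alpha_i \quad \text{for } i = 1, \dots, m, \\
& \int_{\mathbb{R}} s_j(x)\,\nu(dx) \le \beta_j \quad \text{for } j = 1, \dots, n.
\end{aligned}$$

Then, a probability measure $\nu_* \in \mathcal{P}(\mathbb{R})$ attains the maximum in (5.3) if
and only if it is of the following form:

$$\nu_*(dx) = \exp\Bigl( -1 - \kappa_0^* - \sum_{i=1}^m \lambda_i^* r_i(x) - \sum_{j=1}^n \rho_j^* s_j(x) \Bigr)\,dx,$$

with non-negative constants $\kappa_0^*$, $\lambda_i^*$, and $\rho_j^*$ chosen such that $\nu_*$
lies in $\mathcal{P}(\mathbb{R})$ and satisfies the constraints in (5.3). Moreover, if $\nu_*$
attains the maximum in (5.3), then

(5.4)   $$\rho_j^* \Bigl( \int_{\mathbb{R}} s_j(x)\,\nu_*(dx) - \beta_j \Bigr) = 0$$

for all $j = 1, \dots, n$.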


The preceding lemma is standard. See [6, §12.1] for a slight simplification of (5.3) with 
only equality constraints. Alternatively, see [5, Ex. 5.3] for a version of (5.3) with discrete 
entropy and both equality and inequality constraints. The claim (5.4) is the so-called
“complementary slackness” condition (see, e.g., [5, §5.5.2]).

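As a final preliminary, define the following family of probability measures for
$q \in [1,\infty)$, $\beta > 0$:

(5.5)   $$\mu_{q,\beta}(dx) = \frac{1}{2 (q\beta)^{1/q}\, \Gamma\bigl(1 + \frac{1}{q}\bigr)}\, e^{-|x|^q/(q\beta)}\,dx, \quad x \in \mathbb{R}.$$

Note that $\mu_{p,1}$ corresponds to $\mu_p$ of (2.4).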


Proof of Corollary 2.13. Due to the unconditional limit theorem of Lemma 2.11, under either the
surface measure or the cone measure, the unconditional limit is $\mu_{q,\beta}$, where
$\beta < \beta_{p,q}$ of (2.13), and $\mu_{q,\beta}$ is as in (5.5). Therefore, it suffices to
show that under either the surface measure or the cone measure, the conditional distribution
also converges weakly to the same limit $\mu_{q,\beta}$. In view of (2.11) of Theorem 2.12, it
suffices to show that when $C = [0,\beta]$, the unique maximizer $\nu_*$ of (2.9) satisfies
$\nu_* = \mu_{q,\beta}$. Note that due to the basic maximum entropy calculations of Lemma 5.3, we
have for $\beta > 0$,

(5.6)   $$\mu_{q,\beta} = \operatorname{argmax}\{h(\nu) : m_q(\nu) \le \beta\}.$$

To show that $\mu_{q,\beta} = \nu_*$, it suffices to show that $m_p(\mu_{q,\beta}) \le 1$. After
some elementary calculations, we find that since $\beta < \beta_{p,q}$, and using the expression
for $\beta_{p,q}$ in (2.13),

$$m_p(\mu_{q,\beta}) = \frac{\Gamma\bigl(\frac{p+1}{q}\bigr)}{\Gamma\bigl(\frac{1}{q}\bigr)}\, (q\beta)^{p/q} \le \frac{\Gamma\bigl(\frac{p+1}{q}\bigr)}{\Gamma\bigl(\frac{1}{q}\bigr)}\, (q\beta_{p,q})^{p/q} = 1.$$

□


Remark 5.4. To understand the reasoning behind the choice of $\beta_{p,q}$ of (2.13), note
that the primary criterion for an interval $C$ in Theorem 2.12 to yield an insightful result
is for $C$ to induce an optimizing measure $\nu_*$ of (2.9) with an explicit and familiar form.
For example, the explicit solution to the simpler variational problem (5.6) holds for all
$\beta > 0$. However, the original variational problem of (2.9) involves not only the $q$-th
moment constraint $m_q(\nu) \in C$, but also an additional constraint on the $p$-th moment,
$m_p(\nu) \le 1$. To simplify the variational problem (2.9), it suffices to consider values of
$\beta$ small enough such that $m_p(\mu_{q,\beta}) \le 1$, so that the maximizer of (5.6) is also
the maximizer of (2.9) when $C = [0,\beta]$. Elementary calculations show that

(5.7)   $$m_p(\mu_{q,\beta_{p,q}}) = \frac{\Gamma\bigl(\frac{p+1}{q}\bigr)}{\Gamma\bigl(\frac{1}{q}\bigr)}\, (q\beta_{p,q})^{p/q} = 1.$$

That is, $\beta_{p,q}$ is the threshold value such that for $\beta < \beta_{p,q}$, we have
$m_p(\mu_{q,\beta}) < 1$. Note however, that this discussion of $\beta_{p,q}$ (and thus, Corollary
2.13) is valid only for $p \in [1,\infty)$, since for $p = \infty$, we have
$m_\infty(\mu_{q,\beta}) = \infty$ for all $q < \infty$ and $\beta > 0$. That is, for
$q < \infty$, there is no value of $\beta > 0$ such that an analog of (5.7) can hold for
$p = \infty$.
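
The threshold can be evaluated numerically. The sketch below is a check under assumptions rather than a statement of (2.13), whose exact form is not reproduced in this section: it takes $\mu_{q,\beta}$ to have density proportional to $e^{-|x|^q/(q\beta)}$, uses the resulting closed form for its absolute moments, and solves $m_p(\mu_{q,\beta}) = 1$ for $\beta$. The function names are hypothetical.

```python
import math

def moment_gen_normal(r, q, beta):
    """r-th absolute moment of mu_{q,beta}, whose density is ∝ exp(-|x|^q / (q beta))."""
    return (q * beta) ** (r / q) * math.gamma((r + 1.0) / q) / math.gamma(1.0 / q)

def beta_threshold(p, q):
    """Value of beta at which the p-th absolute moment of mu_{q,beta} equals 1."""
    return (math.gamma(1.0 / q) / math.gamma((p + 1.0) / q)) ** (q / p) / q

def moment_numeric(r, q, beta, upper=40.0, steps=100000):
    """Midpoint-rule approximation of the same moment, as an independent check."""
    h = upper / steps
    num = den = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        w = math.exp(-x ** q / (q * beta))  # unnormalized density on x > 0
        num += x ** r * w
        den += w
    return num / den

if __name__ == "__main__":
    p, q = 2.0, 1.0
    b = beta_threshold(p, q)
    print(b, moment_gen_normal(p, q, b))           # p-th moment at the threshold
    print(moment_gen_normal(q, q, 0.3))            # q-th moment equals beta
    print(moment_numeric(2.0, 1.5, 0.4), moment_gen_normal(2.0, 1.5, 0.4))
```

For $p = 2$, $q = 1$ the measure $\mu_{q,\beta}$ is a Laplace law with mean absolute value $\beta$ and second moment $2\beta^2$, so the threshold is $\beta = 1/\sqrt{2}$.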

Remark 5.5. The preceding remark addresses the case of “small enough” $\beta$. We now
consider “large” $\beta$: suppose $\beta > \bar{\beta}_{p,q} = m_q(\mu_p)$. Note that an easy
consequence of Lemma 5.3 is that

(5.8)   $$\mu_p = \operatorname{argmax}\{h(\nu) : m_p(\nu) \le 1\}.$$

For $\beta > \bar{\beta}_{p,q}$ and $C = [0,\beta]$, the $q$-th moment constraint of (2.9) is
automatically satisfied by $\nu = \mu_p$, so the maximizer of (5.8) is also the maximizer of
(2.9). In this case, the “conditional” limit of Theorem 2.12 is equivalent to the
“unconditional” limit of Lemma 2.11. In other words, for sufficiently large $\beta$, the norm
conditioning event of (2.14) is extraneous.

Remark 5.6. As for $C = [0,\beta]$ and $\beta_{p,q} < \beta < \bar{\beta}_{p,q}$, this regime
is less amenable to an immediate geometric interpretation. Whereas the small $\beta$ regime of
Corollary 2.13 and Remark 5.4 allows an $\ell^q$ sphere interpretation via the measure
$\mu_{q,\beta}$, and the large $\beta$ regime of Remark 5.5 allows an $\ell^p$ sphere
interpretation via the measure $\mu_p$, we have different behavior in the intermediate $\beta$
regime. Recall that the maximum entropy considerations of Lemma 5.3 imply that the unique
maximizer of Theorem 2.12 is of the form

(5.9)   $$\nu_*(dx) = \exp\bigl( -1 - \kappa_0 - \kappa_p |x|^p - \kappa_q |x|^q \bigr)\,dx,$$

with constants $\kappa_0$, $\kappa_p$, $\kappa_q$ such that $\nu_*$ is a probability measure that
satisfies $m_p(\nu_*) \le 1$ and $m_q(\nu_*) \le \beta$. To gain insight on these constants,
consider the following cases:

• if $\kappa_p = 0$ and $\kappa_q > 0$, then the complementary slackness condition of (5.4)
implies $m_q(\nu_*) = \beta$, so the form (5.9) implies $\nu_* = \mu_{q,\beta}$. But this is not
a feasible solution, since $m_p(\mu_{q,\beta}) > 1$ for $\beta > \beta_{p,q}$;

• similarly, if $\kappa_p > 0$ and $\kappa_q = 0$, then we have $\nu_* = \mu_p$, but this is not
a feasible solution since $m_q(\mu_p) = \bar{\beta}_{p,q} > \beta$;

• lastly, if $\kappa_p = 0$ and $\kappa_q = 0$, then $\nu_*$ is not a probability measure for any
choice of $\kappa_0$.

Therefore, it must be the case that $\kappa_p > 0$ and $\kappa_q > 0$, which implies that
$\nu_*$ of (5.9) is not of the form $\mu_{r,b}$ for any $r \in [1,\infty)$, $b > 0$. Instead,
$\nu_*$ is of an exponential family that is genuinely different from the generalized normal
distributions.
In this paper, we have only discussed applications of Theorem 2.12 to intervals $C$ of the
form $C = [0,\beta]$, and primarily for small $\beta$, with some discussion of larger $\beta$ in
Remark 5.5 and Remark 5.6. We leave open the question of finding other examples of intervals
$C \subset \mathbb{R}$ which lead to an explicit and meaningful maximizing measure $\nu_*$.